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Abstract. We present an efficient and stable numerical ansatz for solving a 
class of integro-differential equations. We define the class as integro-differential 
equations with increasingly smooth memory kernels. The resulting algorithm 
reduces the computational cost from the usual to TC(T], where T is the 
total simulation time and C (T) is some function. For instance, C (T) is equal to 
In T for polynomially decaying memory kernels. Due to the common occurrence of 
increasingly smooth memory kernels in physical, chemical, and biological systems, 
the algorithm can be applied in quite a wide variety of situations. We demonstrate 
the performance of the algorithm by examining two cases. First, we compare 
the algorithm to a typical numerical procedure for a simple integro-difFerential 
equation. Second, we solve the NIBA equations for the spin-boson model in real 
time. 
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1. Introduction 

A major problem in the study of quantum systems is the influence of the environment 
on its dynamics. A variety of techniques, such as the influence functional [l], 
nonequilibrium perturbation theory [HE], and the weak-coupling approximation [4], 
have been developed to address this issue. The general approach is to derive equations 
of motion which only involve the degrees of freedom of the system. When working with 
systems weakly coupled to a rapidly relaxing environment, one can make the Markov 
approximation, which results in a local in time, flrst-order differential equation. This 
Markovian master equation is relatively easy to solve either analytically or numerically. 
As soon as the restrictions of weak coupling or fast environmental relaxation are 
removed, however, integro-differential equations appear, which reflects the fact that 
the environment retains a memory of the system at past times. 

In particular, a number of interesting physical systems have integro-differential 
equations of motion with polynomially decaying memory kernels. For instance, 
the spin-boson model [5] and Josephson Junctions with quasi-particle dissipation or 
resistive shunts [6j have l/t^ asymptotic kernels when going through a dissipatively 
driven phase transition. The occurrence of polynomially decaying kernels is common 
in physics and chemistry because many open systems are connected to an ohmic 
environment. Polynomially decaying kernels appear in biological applications as well. 
[7] Due to the frequent appearance of this type of kernel and also others which satisfy a 
requirement we call increasingly smooth below, it would be beneflcial to have efficient 
numerical methods to handle the corresponding integro-differential equations. Such 
a method will be useful in applications in many discipHnes. In addition, the method 
will enable the simulation of memory-dependent dissipation when used in conjunction 
with other computational methodologies, such as matrix product state algorithms for 
open systems. [Sj [Qj fTOl fTT] 

In this paper, we give an efficient and stable numerical ansatz for solving 
integro-differential equations with increasingly smooth memory kernels. Using typical 
techniques, one incurs a computational cost of AT^ oc N^, where A is some constant, 
T is the total simulation time, and N is the number of time steps in the simulation. 
Thus, there is a considerable advantage in using a high-order approach to reduce 
the required value of N as much as possible. However, the computational cost of 
the algorithm described herein scales as TC (T), where C (T) is some function that 
depends on the form of the memory kernel. For example, C (T) is equal to InT for 
polynomially decaying kernels. Such a reduction represents a substantial advancement 
in numerically solving integro-differential equations since it allows for the efficient 
calculation of long-time behaviour. 

This paper is organized as follows: in section[2l we introduce the types of equations 
under consideration, deflne increasingly smooth, and present the numerical ansatz with 
a discussion of errors. We demonstrate the performance of the algorithm using two 
example integro-differential equations in section [31 First, we consider an integro- 
differential equation composed of a polynomially decaying kernel and an oscillating 
function. Comparing directly to a two stage Runge-Kutta method, we see a large 
improvement in the scaHng of the error as a function of computational cost. Second, 
we solve the noninteracting bHp approximation (NIBA) equations for the spin-boson 
model in real time. We show that one can rapidly obtain the full real time solution. In 
[Appendix A\ we discuss how to extend the algorithm to higher orders. In [Appendix B[ 
we outline the Runge-Kutta method we use as a comparison to the numerical ansatz. 
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In [Appendix C| , we give a simple derivation of the NIB A equations, showing the 
physical situation behind the appearance of an increasingly smooth memory kernel. 

2. Algorithm 

We want to be able to numerically solve linear integro-differential equations of the 
form 

^ = -IC f dt'a {t - t') e-<'-''))C'p (f) (1) 
ot Jq 

= - IC f dTa{T)e-^^^^IC'p{t-T) (2) 







where p{t) is some quantity of interest (like a density matrix) and a(r) is the memory 
kernel. IC, IC' , and C are time-independent operators. An equation of the form 
[2] appears in open quantum mechanical systems in the Born approximation to the 
full master equation [4j, some exact non-Mar kovian master equations [12], and in 
phenomenological memory kernel master equations. [131 [HI [15] For an arbitrary form 
of the memory kernel, it is necessary to compute the integral on the right-hand side 
of [2|at each time step in the numerical simulation. Thus, ignoring the error of the 
simulation, the computational cost scales as T^, where T is the total simulation time. 
This is prohibitively expensive in all but the shortest time simulations. 

On the opposite extreme is when the memory kernel has an exponential form 

a{t - t') = 7e-T(*-*') . (3) 

In this case, both functions responsible for evolving the integrand 

obey simple differential equations in t. This allows us to define a "history" 

H{t)^ f dt'a{t-t')e'^^'~''^K'p{t') (4) 







which obeys the differential equation 

Hit) = iK'pit) - -fH{t) - CH{t) . (5) 
By solving this local in time differential equation together with 

p{t) - -ICH(t) , (6) 

we can solve the integro-differential equation with a computational cost scaling as T. 
Cases in between these two extremes have a spectrum of computational costs scaling 
from r to T^. 

2.1. Increasing Smoothness 

We are interested in integro-differential equations of the form [2l with memory kernels 
which are increasingly smooth 



d_ 

dt 



a' it) 



a (t) 

The idea behind increasing smoothness will become clear below. More specifically in 
this paper, we will look at examples where the memory kernel decays polynomially 
outside some cut-off time 

= ( ocjL t>AT 
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where Q!at (t) is a bounded but otherwise arbitrary function and p is some positive 
real numberlj AT is a cut-off which allows us to ignore any transient behaviour in 
the memory kernel which does not satisfy the increasingly smooth condition [7l There 
will generally be some natural cut-off to the integral. For the overall scaling of the 
computational cost for large simulation times, we can ignore the resulting problem- 
dependent, constant computational cost due to the integral at times t < AT. 

Below we will give an algorithm that is specific to the polynomial kernellH Similar 
algorithms are possible for any kernel which gets smoother as the time argument gets 
larger. The computational speedup will depend on how the function gets smoother. 
For instance, the polynomially decaying kernel gets smoother as 
di/t" 

(9) 



dt 



1/tP 



P 
t 



which allows one to take larger integration blocks as t gets larger. In this case, one can 
take a grid with spacing S cc t to cover the integration region. We call the partition 
around a grid point a block. In this case, the number of blocks scales logarithmically 
with the total time of the simulation. Consider another form of the kernel, /? (t), that 
gets smoother even faster than polynomial, for example 



P{t) 



he- 



rn 



This will give an even smaller number of blocks (that have exponential size in t) needed 
to cover a given range of integration and thus a faster speedupl§l In this case, though, 
a simple truncation of the integral is sufficient, which is not the case for a polynomially 
decaying kernel. 



2.2. Blocking algorithm for a polynomially decaying kernel 

The algorithm is designed to take advantage of the fact that at each new time step 
in solving the integro-differential equation [21 we almost already have the integral on 
the right hand side. Thus, if we have p {t) at all the previous time steps and we group 
them together in a particular way, we should be able to do away with both storing 
all these previous values and the need for a full integration of the right hand side. In 
more concrete terms, we want to be able to evaluate the history integral 

/(T,AT)= / dTa{T)F{T,T) , (11) 

J AT 

with 

F(T,T) = e^^/CV(T-r), (12) 

in such a way that when T — > T + h m the integro-differential equation [21 all the 
pieces used to compute the integral can be recycled by evolving for a time step h and 
then added back together to get the new integral. This is just what is achieved for 
the exponential memory kernel [H but in that case it is very extreme, the old integral 
simply has to be multiplied by a factor and added to a new piece. Below we show 
how to evaluate the integral with a blocking scheme that requires only InT number 

I For the algorithm, p need not be greater than one. However, physical equations of motion, such 
as the NIBA equations at a = 1/2, have an additional factor in the kernel to ensure its integral is 
bounded. 

§ 1101 has two solutions. Only the one with bounded integral would be physically relevant. 
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of blocks to cover the integration region, can achieve any desired accuracy, and has 
blocks that can all be recycled when T T + h. Again, we emphasize that the scheme 
we give now is specific for polynomially decaying memory kernels. The basic scheme 
can be constructed for other memory kernels as well simply by choosing a block size 
such that the block size times the smoothness stays constant. 

Lets first rewrite the integral [TT] as a sum over K blocks of width 6i and Taylor 
expand the memory kernel: 

I (T, AT) = yl deF in + e,T)a {n + e) (13) 

'-<5,/2 



1=1 
K 

E 

2=1 



+<5x/2 



i5,/2 



deF [n + e, T) 



X {a{n) + a' {n)e + 0{e^)] 
The lowest order approximation to this integral is 

f-+<5./2 



K 

E"(-') / 

i=l 



deF in + e,T) 



(14) 



(15) 



This equation represents the fact that we will take some step size, h, to compute the 
integral of F over the block, but use some other variable step size to calculate the 
integral of the product of a and F. For the polynomial kernel, the whole procedure 
is based on choosing a block size that grows with increasing distance from the current 
time, 

6^ = bn , (16) 

or close to it, as shown in Figure [TJ We call b the block parameter. This function 
for the block size gives a constant value when multipHed by the smoothness [9] of the 
kernel. Assuming the function F is bounded by 1, the highest order error on each 
block is bounded by 



a' {n) 



+S,/2 



i5i/2 



deF {n + e,T)e 



< 



p b 



(17) 



pb 



1 



Thus, when choosing the blocks in this way, there is a constant ratio, pb/A, of the 
bound on the error to the bound on the integral. The bound on the highest order 
error of the whole integral is 

deF{n + e,T)e < 

'-<5,/2 

Roughly, we then expect the error to decrease at least linearly in the block parameter 
b. We discuss the errors in more detail below. 

If we perfectly choose the block positions, we can calculate the number of blocks 
required to cover the integration region. The first block has position (its midpoint) 



K 

E"' 



4(p - 1) ATP- 



(18) 



AT 

^ 2 



and given the position of block i, block z + 1 is at 



ti+i — ti 



b b 1 

+ 2^«+i - Y 



(19) 



(20) 
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Figure 1. Procedure for evolving the integral 1111 in time. We start with the 
integral represented as a sum over variable size blocks, with short times having 
constant block sizes and longer times having larger block sizes. We evolve for some 
amount of time (here shown as a large amount of time, but in practice this whole 
procedure is done discretely in the small time steps of the differential equation). 
Then we cleanse the blocks, grouping them together so long as their size is not 
too big for their location in time. 
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The n block is then at 
(1- 



(1 



AT. 



(21) 



Since T is arbitrary, the n*'' block does not necessarily cover the remaining area. But 
approximately, 

i„«r-^r + o(62) . (22) 



and the total number of blocks for the integral is 

'AT' 

III I 

K = \n] = 



In(^) 



In (1 - I) - In (1 



(23) 



In practice, K will always be bigger than [23l because of two factors. One is the finite 
step size, h, which forces the blocks to be an integral multiple of h. Two is that we 
are not able to take the complete integral and divide it up perfectly. The division 
into blocks has to happen as we solve the integro-differential equation. We will see in 
subsection 12.31 that neither of these factors poses a serious problem and the true value 
of K will be of the same order as [23l For small b (or large K) , we can simplify to 



K : 



VAT 



Within the algorithm, then, we need to keep track of the K integrals 

j'+Si/2 

h = / deF {n + e, T) , ,5, = bn 

J -Si/2 

which can be summed up with the weight a {vi) to get the full integral, 
the explicit form of F, the K integrals are 

r+S,/2 

dee-^^^'+'^JC'piT-T,~e). 

-S,/2 

When we increment T ^ T + h, we first need to fix the block size. Si 



(24) 

(25) 
Putting in 

(26) 
Bi. Thus, 



the blocks will no longer be exactly bn in width. As T 
integrals are easily updated by 

r+Bj2 

deF in + e, T) 

-Bi/2 

f+B^/2 ^ 



T + h. 



Ti + h and the 



-Bi/2 



deF in + e, T) 



(27) 



After evolving, Bi < bn, which is acceptable since the smaller the blocks the smaller 
the error. The block sizes have to grow eventually or else we will not get the logarithmic 
coverage. Each time we increment T we can check whether nearest neighbor blocks 
can be grouped. We can group them so long as 
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where r'"^'" is the midpoint of the new block. This is the "cleansing" stage of Figure 
[U and is discussed in more detail in subsection 12.31 When the condition on the new 
block size is met, we can simply add the blocks together 



To summarize, we defined a logarithmic covering of the integration region in terms 
of growing blocks. As we solve the integro-differential equation, we can evolve the 
blocks and perform further groupings to ensure approximately the same covering. This 
covering allows for a logarithmic speedup in computing the history and a logarithmic 
reduction in the memory necessary to store the history. At the same time, due to the 
increasing smoothness of the polynomial memory kernel, it has a controllable error. 
The algorithm can be extended to higher orders as shown in [Appendix A 

2.3. Growing blocks in real time 

The above description of the blocking algorithm presented the idea of using a 
logarithmic covering of the history integral. The formula [23] gives the number of 
blocks needed for the covering if we could perfectly choose their positions and widths. 
However, when solving the integro-differential equation, growing pains exist: the 
blocks have to be formed from the smaller blocks of the integration step size h. The 
block covering will thus be suboptimal. 

To determine how close to optimal the number of blocks will be, we perform a 
simulation. Here we choose a polynomially decaying function but with a more natural 
cut-off, i.e.. 



This will be fully equivalent to 1/ {t'Y with a cut-off AT = 1 and a shifted time 
t' =t + 1. Because of this, the optimal formula [23] will only hold for large T (or with 
AT = 1 and T replaced with T + 1). Within the simulation, we start at T = and 
create blocks of size h. At each step we check if neighboring blocks can be grouped by 
satisfying the condition [28] If they satisfy that condition, we group them and check 
the again with the next block. We perform this grouping from the smallest to the 
largest blocks, but the directionality of growth does not matter. Figure [2] shows how 
the number of blocks grow in practice. Although suboptimal, it is still on the same 
order as the optimal K. How close it is to optimal is dependent on the step size and 
block parameter. 

2.4. Errors 

Given the blocking construction above, we can analyze how we expect the errors to 
behave versus both the number of blocks K and the computational cost. The first 
question that is natural to ask is what error do we make by the replacement of the 
integral [TT] with the approximation [15] This gives us an idea of the error incurred by 
keeping K blocks of history in a logarithmic covering. 




(29) 



a it) 



1 



(30) 




Figure 2. The number of blocks K versus T. The dotted (blue) curve shows the 
optimal case given bv l23l The dashed (green) curve shows the number of blocks 
in practice. The dot-dashed curves represent the asymptotic behaviour of the 
number of blocks versus time. These curves intersect the axis at T = 1 because 
of the cut-off AT = 1. The inset shows the two latter quantities plotted together 
with the linear growth of K for a typical numerical solution. The parameters in 
the evolution are the following: AT = 1, d = 0.016, and h = 0.002. 



To answer this first we consider just the integral 

where we have some frequency of oscillation uj and some power of the polynomial 
decay p. In the integration below we take p = 2. It is important to note that in 
demonstrating the algorithm with just the integral [311 and not an integro-diflferential 
equation, it is not possible to show the computational speedup. However, it can 
show the accuracy of the integration as a function of the logarithmic coverage of the 
integration region when using the algorithm. We can also use it to get an idea of how 
the error changes when we vary uj or p without having to solve the more complicated 
integro-diflFerential equations. The form of this integral was chosen because of its 
relevance to the real-time simulation of quantum systems, where the integral of the 
integro-differential equation will be a sum over oscillating terms times a polynomial 
memory kernel. 

We gave a simple error bound above. To get a better idea of errors, we can 
examine the behaviour of the integral [31] as a function of the blocking parameter, 
frequency, and the integration range (e.g., simulation time). If one takes the highest 
order error of the expansion [141 the worse case error in a block will occur when there 
is approximately one oscillation in the block, 

27r , , 

bn^—, 32 

UJ 

as this is when the asymmetry contributes most. However, this contribution will only 
occur in some finite region of the integral because the block sizes vary. When one 
changes the block parameter, this error will be shifted further back in time, e.g., to 
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the left in Figure [H The memory kernel has a smaller value at this past time, and 
thus the error will be decreased. If one has many frequencies contributing to the 
integrand, then each of their error contributions will be shifted to a later time as one 
decreases the block parameter. Thus, the error will be systematically reduced. The 
same conclusion can be drawn if one expands both the functions in the integrand 
around their midpoint for each block. The fractional error (the highest order error 
term as a fraction of the zeroth order term) in this case is 

— {pip + 1) - 2iLupn - wVf) . (33) 
The expansion will be invalid at approximately 

which gives bxi 5/uj. 

We compare the logarithmic covering to the midpoint method with step sizes 
h = 10~^ * V with j = 1, . . . , 9. This gives a number of blocks ifmirf = Tfjh. For the 
algorithm, we use a step size h = 10"'^ to compute the integrals [25] and then use block 
parameters 6 = 10"'^ * 2-' with j = 1, . . . , 8 to compute [TH The number of blocks Kaig 
is then given approximately by [231 It is only approximate because each block width 
has to be rounded down to a multiple of /i, this gives a larger number of blocks. We 
use this larger number of blocks as the value for Kaig ■ 

If we are just looking at the integral [SU we expect the error due to the algorithm 
to be 



i?aig(xX.,g6^«-^(^ln^j (35) 
so long as 5T ^ 27r/ijj. This can be compared to the error of the midpoint method 

Emid oc Kraidh^ ~ -772 — ' (36) 

mid 

Thus, if we want a fixed error, E, and end time, we get a difference in the number of 
blocks of 

1/2 

Krmd « ( ^ ) (37) 

compared to 





Kaig « ^— ^ , (38) 



which is quite large for long times. 

Figure[3| shows how the error behaves versus the number of blocks used to compute 
the integral. The fast decrease of error as a function of Kaig shows that we can 
block together integrals as we evolve the integro-differential equation and have a 
controlled error. Further, the longer the simulation time, Tf, the better performing the 
algorithm should be when compared to the midpoint method. In addition, the overall 
performance of the algorithm should not be significantly affected by the frequency of 
integrand oscillation oj or the power p. One interesting feature of the error versus Kaig 




Figure 3. Absolute error versus number of blocks for the two methods of 
calculating the integral [31] with lu = 2tt and p = 2 (similar curves were found 
for other w). The solid (red) curves are for the midpoint method and the dashed 
(blue) curves are for the first order blocking algorithm. The main figure shows the 
error for Tf = 100 and the inset shows it for Tf = 10. Since the midpoint method 
is second order, we find what is expected, E oc K~^. The blocking algorithm 
has the same dependence on K. The difference in error for the same number of 
blocks with the two methods is dependent on T/lnT, refiecting the fact that the 
algorithm chooses blocks at optimal places and only covers the integration area 
with InT blocks. We take as the exact value the one computed with a step size 
h = 10-4. 



figure is the slight dip in the error at approximately bT ^ 27r/a;, which represents the 
error discussed above being pushed out of the integration region. 

The second question we can ask is how the error depends on the computational 
cost compared to more traditional methods. If we use the method for integro- 
differential equations outlined in [Appendix Bt we have an error at each step of 
and we have N steps. Thus 

Ermd (xNh^ (X-^, (39) 

where Cpu oc N"^ is the computational cost and we hold the simulation time fixed. For 
the algorithm 

Ealg (X (40) 

where we hold the step size h fixed and Cpu = NK . That is, the error goes down with 
a higher power in the computational cost. 

Of course, using the algorithm is not the only way to get the error to scale better 
with the computational cost. One can also just block the history with constant blocks 
larger than h. Although the error does scale better, the error can never be lower than 
just using the brute force method, e.g., the error versus cost curve will never cross the 
brute force curve. For the algorithm these two curves do cross, as we will see below. 
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3. Examples 

We consider two example problems to demonstrate the performance of the algorithm: 
(1) solving an integro differential equation that has both an oscillating function and 
a polynomially decaying function in the integral and (2) the NIBA equations for the 
spin-boson model. 



3.1. Oscillating integro- differential equation 

The oscillating integro-differential equation we use is 

h (t) = iLon it) - [ dt'a (t - t') n (t') (41) 
Jo 

with 

a{t) = . (42) 

We get rid of the oscillation in the differential equation by setting 

P (t) = e-"^'n (t) , (43) 

which gives 

P{t) = - dt'a{t-t')e-"^('-'')p{t') . (44) 



Jo 

This integro-differential equation involves the computation of an integral similar to 
[3T| with p = 2, at each time step. With the algorithm, we will be able to recycle the 
integral with the logarithmic covering at each step, rather than computing it from 
scratch. Note that we choose blocks by b{t + 1) here because we have the cut-off 
included in the polynomial decay. The simulation results for one set of parameters are 
shown in Figure |4l 

Now we examine the error versus computational cost. Figure [5] is one of the main 
results of this paper. It shows three key results: (1) The algorithm has significantly 
better error for the same computational cost compared to a brute force method in a 
large range of parameter space. (2) As the simulation time is increased, the efficiency of 
the algorithm drastically increases compared to the brute force method. (3) The figure 
also suggests how to use the whole ansatz. Rather than have two parameters b and h, 
one should set 6 oc /i (for polynomial decay). This will give a similar convergence to 
the exact answer as the second order method, but a smaller factor of proportionality. 

As mentioned in subsection 12.41 one can also consider another method that just 
uses a constant grouping to reduce the number of blocks in the history. However, this 
method can not achieve better results as determined by the error versus computational 
cost. For a given step size, the highest accuracy will be achieved by using the brute 
procedure. A lower computational cost can be achieved by grouping the history into 
constant blocks, but the reduction in the computational cost will yield an even stronger 
loss in accuracy because E oc , when one step size is fixed. Thus, a similar figure 
to Figure [5] would show curves moving only upward from the brute force line and not 
crossing it. 
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Figure 4. Real time evolution of|44]witli lj = 2it. The solid (blue) curve shows 
the real part of P (t) and the dashed (red) curve the imaginary part. These 
curves were calculate with the algorithm with h = 0.002 and d = 0.016. The 
crosses represent the solution using the midpoint procedure of [Appendix B| with 
a step size oi h = 0.001. The inset shows a blow up of the initial time where there 
is some transient structure in the solution. 



3.2. NIB A equations 



The NIBA for the spin-boson model gives a good physical example for a system with 
a polynomially decaying kernel. The NIBA equations are derived by Leggett et al fE\, 
see also [Appendix C| for a simple derivation, from which we can get the equation of 
motion 



dt'f{t-t')p{t') 



where 



f{t) = A^cos [2Q!tan~4] 

Time is in units of the cut-off frequency ijJ~ 
becomes 



{TTt//3uJc) CSCh {TTt//3uJc) 



(1+t 



2^1/2 



2a 



(45) 



(46) 



/(i) = 



cos [2a tan" ^i] 



At zero temperature (/3 — > oo) the kernel 



(47) 



(l + t2)" 

The 1 + t'^ has a physical cut-off to the low time behaviour of the polynomial decay. 
There are only two parameters of interest to us. We set A = 0.2 as it is required to be 
small (see [Appendix C| ) and we vary the dissipation strength a to see the behaviour 
of P (t) on different sides of the dissipative transition. Varying a also shows the 
algorithm at work for different powers of polynomial decay. 

Depending on the value of the dissipation strength a, we have to use a different 
cut-off AT. For a = 0.5, 1.0, and 1.5 we use AT = 1, 3, and 2, respectively. After 
these cut-offs, / (t) is increasingly smooth (it is also smoother than bare polynomial 
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Figure 5. Absolute error versus computational cost. The error is computed by 
comparing the simulation results with those the brute force method with h = 0.001 
for Tf = 50 (in the inset, the "exact" solution is taken as the brute method with 
h = 0.0002 and Tf = 10). Circles represent the brute force method for step 
sizes 0.1, 0.05, 0.025, 0.01, and 0.005 (in the inset, 0.1, 0.05, 0.025, 0.01, 0.005, 
and 0.001). The other data represents using the algorithm with the step size 
as shown and each data point has a different block parameter starting with 2h 
and increasing by factors of 2. For the longer simulation time of Tf = 50, the 
algorithm has quite a substantial decrease in error for the same computational 
cost (for the shorter time in the inset, one can see that there is barely a gain. 
This is due to the overhead cost in implementing the algorithm). Also, the curves 
have the behaviour discussed in subsection 12.41 We perform all computations on 
a P4, 3 GHz processor. 



decay) . We want to point out that due to the form of the polynomial decay, the most 
efficient way to choose block sizes is by 6 (l + t^) /t, which just comes from the inverse 
of the increasing smoothness. Despite this, we still use the less efficient bt. 

The simulation results for the NIBA equations are shown in Figure [H Using 
the algorithm allows one to have long simulation times for negligible computational 
cost. Further, simulations of NIBA-like equations on a lattice or with time dependent 
Hamiltonians can be done efficiently with the method. 

We want to emphasize that the simulation of the NIBA equations at finite 
temperature |46] can also use the algorithm presented in this paper. The finite 
temperature, however, introduces a long-time cut-off into the kernel, beyond which the 
kernel may not be increasingly smooth. The contribution beyond this cut-off, though, 
is negligible. Thus, the algorithm can be used for times less than ~ /3 exactly how it 
is used for the zero temperature case, and there can be a truncation of the kernel at 
times beyond times ~ (3. 

4. Conclusions 

We have given an efficient and stable numerical algorithm for solving integro- 
differential equations with increasingly smooth memory kernels. The general 
computational speedup is — > TC (T), where C (T) depends on how fast the kernel 
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Figure 6. NIBA equations simulated in real time for a = 0.5, 1.0, and 1.5. The 
simulations were done with h = 0.008 and d = 0.016. They took approximately 
two minutes of CPU time on a P4, 3 GHz processor. This is orders of magnitude 
faster than a brute force approach would take to get the same error. From 
the figure we can clearly observe the physics of the different regimes. For weak 
dissipation, one gets decay to P = 0. For the transition, a = 1, one gets 1/t^ 
decay of P(t). Above the transition, P (t) gets locked into a nonzero value. 



gets smooth. For example, the computational cost of the algorithm for polynomially 
decaying kernels is Tin T rather than the usual . 

Using a simple integro-differential equation, we demonstrated how well the 
algorithm performs compared to a second order, constant step size method. For 
long times, there is quite a substantial speedup in the computation cost to achieve 
a given error. The solution to the NIBA equations for the spin-boson model in 
real-time showed that one can get results and analyze the situation quite rapidly. 
Similar procedures can be applied to other forms of memory kernels which satisfy the 
increasingly smooth condition [71 In these other cases, the computational speedup can 
be better or worse than the case presented here. 

In practice, the usefulness of this algorithm is due to the possibility of its 
incorporation into other techniques, such as matrix product state algorithms, or its 
use with time-dependent Hamiltonian terms. Thus, one can simulate lattices or driven 
systems subject to strong dissipation. The algorithm can also be used for equations 
less restrictive than [21 such as integro-differential equations with memory kernels 
dependent on both time arguments. 
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Appendix A. Higher order blocking algorithms 

The algorithm above can be extended to higher orders. For instance, if we want the 
next order method, we need to be able to evaluate and update 

/I (T, AT) 

= /" (T, AT) + V a' (r,) / deF (r, + e, T) e . (A.l) 

The latter term picks out the asymmetric part of F inside of block i and then multiphes 
it by the derivative of a. Lets define this asymmetric part of the block as 

/■+<5i/2 

A,{T)= \ deF{n + e,T)e. (A.2) 

If we suppose we have these asymmetric integrals at a given time, and we want to 
increment T, we first need to fix the block size, as before. Then we can update them 

by 

A,{T + h) = e-^'-^'^A,{T) (A.3) 
We also need to be able to add these blocks together. We can do this by 

p+B"-'""/2 

j^new ^rj.^ ^ / ^^nen, ^^^T)e (A.4) 

= A, (T) - B,+iI, (T) + A+i (T) + B,h+i (T) 

where we use the asymmetric integrals from before we did the blocking and also the 
first order integrals [25] for the two blocks. The error for an order z algorithm will have 
a bound proportional to 



Appendix B. Two stage Runge-Kutta method 



In this article, we compare the proposed numerical ansatz to a two stage Runge-Kutta 
method. Since we are deahng with integro-differential equations, we give the details 
of the second order technique we use. In the main text, we discuss how the errors 
scale with total simulation time, step size, and computational cost. 

For all the examples in this work, the basic integro-differential equation O is 
reduced to 

dt'a{t-t')e-^('-*')p{t') , 



dp{t) 



e.g., /C,/C' =1. In a generic form we can write 



dpjt) 
dt 



I 



t,p{t), / dt'T{t-t')p{t') 



Discretizing time as 

tn = to + nh , 

we can write a general two stage Runge-Kutta integration scheme 

Pn+l = Pn + h dpn , 



(B.l) 

(B.2) 

(B.3) 
(B.4) 



dpn = / [t,Pn,Zn] , 



{B.5) 
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and 



where 



and 



Pn = Pn + dP^ , (B.6) 
dPn = ^f[t,Pn,Zn] , (B.7) 

Zn^h^ fnm [Pm + Pm+l) /2 , (B.8) 



h ^mn {Pm + Pm+l) /2 

m— 1 

+ [Pn + P„) /2 , (B.9) 



^nm^-^{^{tn-t'))lz =Jll dt' T {tn - t') (B.IO) 



2 



^0 = T / dt'T{t„-t') . (B.ll) 

Although using the average Tnm over an interval does not increase the order of 
the method, it does preserve important properties of the kernel such as its total 
integration. This is very important in cases where the kernel integrates to zero and 
thus the transient behaviour completely determines the steady state. We use the 
average of the kernel over each block with the algorithm as well. The Runge-Kutta 
scheme can of course be generalized to higher stages and to kernels with two time 
arguments. O [HI [H [IS] 



Appendix C. Simple derivation of the NIBA equations 

In this appendix we give a simple derivation of the NIBA equations for the spin- 
boson model to show how polynomially decaying memory kernels can arise physically 
in the case of strong dissipation. The derivation of the NIBA equations is based 
on the observation that they come from a Born approximation of a transformed 
Hamiltonian . [20j 

The spin-boson Hamiltonian is 

HsB = ~\^'^^ + ^^^^ + X! ^kal^k + l<yz X] 9k {a\ + (^^-l) 

k k 

where we have a two level system with internal coupling constant A and bias e. The two 
level system is coupled to an collection of bosons of frequencies {tOk} with a coupling 
constant gk = Ck/ V^niktJk and overall coupling factor 7. The spectral density of the 
bath is given by 

J{u;) = 7rY,9lH^-^k) . (C.2) 

k 
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The physical scenario we want to explore is one in which the two level system is 
initially held fixed in an eigenstate of Uz while the bath equilibrates around this state. 
That is, the bath will equihbrate around the Hamiltonian 

Hb. = '^ uJka-lak + ^ 7.gfc (^al + a^^ (C.3) 

k k 

if we hold the two level system in the +1 eigenstate of az- This gives the thermal 
starting state for the bath 

Ro - ^— . (C.4) 

Then, we want to release the two level system and follow its dynamics in real time. 
In particular, we look at the expectation value of cTz 

P{t)^{az)t (C.5) 
= ti {e'"^^'aze-'"^^'\l){l\® Ro} . (C.6) 

Since we are interested in strong dissipation, we can perform a canonical 
transformation on this Hamiltonian to incorporate all orders of the system-bath 
interaction. With 



we get 



H = e^Hsse-^ (C.8) 
iA((7+B_+a_i3+) + ie 



- ((7+B_ + a-B+) + iea, + ^ w^flt (C.9) 



where 



(C.IO) 



For the unbiased case, e = 0, this gives the interaction picture Hamiltonian 



with 



H' (t) - -iA {a+Bl (t) + a_Bi (t)) (C.ll) 

Bi (t) = exp |±2 ^ ^ (afce-^'"'^'* - e''"'=*) | . (C.12) 

We can then transform the equation for P (t) to get 

P(t) = tr{e'^V,e-*^*|l)(l|®i?[)} (C.13) 

where 

R!^^e-<^^^'^^<''^ jZ'^. (C.14) 

We can find the master equation in the Born approximation for P (t), also known 
as the noninteracting blip approximation, by performing perturbation theory on lC.13l 
To second order in A, 

P(t) = - / dt'j(t - t')P{t') (C.15) 
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with 

fit) =^{{[Biit),BL (0)])^. 

+ {[Bi{-t),BL{0)])n'J (C.16) 

= ^{[Biit),BLiO)])R'^ (C.17) 

where we have used that the correlation functions are equal. To compute / (t) we can 
use the Feynman disentangling of operators |21j to get, in the notation of Leggett et 
al El, 



with 



and 



/ it) = A2 cos I^Qi exp l^il^Q^ (t)| 

Qi (t) = / dcj— V^sin(tjt) 
Jo ^ 



(C.18) 
(C.19) 



Q2{t)^J^ dij'-^^ [1- cos ujt) coth (^^^ . (C.20) 

For ohmic dissipation, J (uj) = rjojcxp {—uj/lUc), these quantities become 

Qi (t) ^ T] tan^^ LUct (C.21) 

and 

■ln(l+w^i2) + ,^ln^ 
With a = 2?77^/7r, this gives l46l for the NIB A memory kernel 



'2 (t) = I In (1 + u;lt') + In sinh . (C.22) 



References 



[1] Feynman R P and Vernon F L 1963, Ann. Phys. - New York 24 118 
[2] Keldysh L V 1965 Sov. Phys. JETP 20 1018 

[3] Kadanoff L P and Baym G 1962 Quantum Statistical Mechanics (W. A. Benjamin, Inc., New 
York) 

[4] Breuer H -P and Petruccione F 2002 The Theory of Open Quantum Systems (Oxford University 
Press, Oxford) 

[5] Leggett A J, Chakravarty S, Dorsey A T, Fisher M P A, Garg A, and Zwerger W 1987 Rev. 

Mod. Phys. 59 1 
[6] Schon G and Zaikin A D 1990 Phys. Rep. 198 237 

[7] Gushing J M 1977 Integrodifferential equations and delay models in population (Springer- Verlag, 
Berlin) 

[8] Zwolak M and Vidal G 2004 Phys. Rev. Lett. 93 207205 

[9] Verstraete F, Garcia-RipoU J J, and Cirac J I 2004 Phys. Rev. Lett. 93 207204 
[10] Vidal G 2003 Phys. Rev. Lett. 91 147902 
[11] Vidal G 2004 Phys. Rev. Lett. 93 040502 
Jl2] Zwolak M and Refael G 2007 in preparation 
113] Shabani A and Lidar D A 2001 Phys. Rev. A 71 020101(R) 
114] Barnett S M and Stenholm S 2001 Phys. Rev. A 64 033808 

115] Daffer S, Wodkiewicz K, Gresser J D, and Mclver J K 2004 Phys. Rev. A 70 010304(R) 
116] Leathers A S and Micha D A 2005 Chem. Phys. Lett. 415 46 

117] Brunner H and van der Houwen P J 1986 The Numerical Solution of Volterra Equations (North- 
Holland, New York) 

118] Linz P 1985 Analytical and Numerical Methods for Volterra Equations (SIAM, Philadelphia) 
119] Lubich Ch 1982 Numer. Math. 40 119 

[20] Aslangul C, Pottier N, and Saint-James D 1986 J. Physique 47 1657 

121] Mahan G 2000 Many-Particle Physics (Kluwer Academic/Plenum Publishers, New York) 



